Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost
Authors: Zhixiang (Eddie) Xu
Abstract of the Dissertation
by Zhixiang (Eddie) Xu
Doctor of Philosophy in Computer Science
Washington University in St. Louis, 2014
Research Advisor: Professor Kilian Q. Weinberger, Chair
The past decade has witnessed machine learning establish itself as a necessary component of several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted, and cost must be strictly accounted for at test time. A typical question is: if an application consumes x additional units of cost at test time but improves accuracy by y percent, should the additional x units be allocated? At its core, this is a trade-off between accuracy and cost. In this thesis, we examine the components of test-time cost and develop different strategies to manage this trade-off.
We first investigate test-time cost and find that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies that explicitly trade off accuracy against these two components of test-time cost during classifier training.
To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser [132] and Anytime Representation Learning (AFR) [135]. GreedyMiser incorporates feature extraction cost information into classifier training in order to explicitly minimize test-time cost.
AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) [110] into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
We then introduce Cost Sensitive Tree of Classifiers (CSTC) [134] and Cost Sensitive Cascade of Classifiers (CSCC) [137], which share a common strategy: trading off accuracy against amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each optimized for a specific sub-partition of the input space and extracting a different, specialized subset of features. CSCC extends CSTC by building a linear cascade instead of a tree, to cope with class-imbalanced binary classification tasks. Because both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state of the art on real-world data sets.
To trade off accuracy against the high evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVM), whose test-time evaluation cost is typically substantial when they are learned from large training sets. CVM is a post-processing algorithm that compresses the learned SVM model by reducing and ...
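The amortized-cost argument behind CSTC and CSCC can be illustrated with simple bookkeeping. The sketch below assumes a fixed, hand-specified cascade (it trains nothing and is not CSCC itself) and charges each input only for the features extracted at the stages it actually reaches; an input that exits early never pays for the later, more expensive features. The stage structure, exit rules, feature costs, and the function name `amortized_cost` are all hypothetical. A tree such as CSTC's would follow the same per-input accounting along whichever path an input is routed.

```python
# Toy bookkeeping for amortized test-time cost under early exits.
# Illustrative only: the cascade is fixed by hand; nothing is learned here.
from typing import Callable, Dict, List, Sequence, Tuple

# A stage = (features it needs, rule deciding whether an input exits here).
Stage = Tuple[Sequence[str], Callable[[Dict[str, float]], bool]]


def amortized_cost(stages: List[Stage], feature_costs: Dict[str, float],
                   inputs: List[Dict[str, float]]) -> float:
    """Average feature extraction cost per input for a cascade with early exits."""
    total = 0.0
    for x in inputs:
        extracted = set()
        for features, exits_here in stages:
            new = set(features) - extracted  # a feature is paid for at most once per input
            total += sum(feature_costs[f] for f in new)
            extracted |= new
            if exits_here(x):  # early exit: later stages (and their features) are skipped
                break
    return total / len(inputs)


costs = {"cheap": 1.0, "expensive": 20.0}
# Hypothetical two-stage cascade: stage 1 lets inputs whose cheap feature is low
# exit immediately; stage 2 extracts the expensive feature and always decides.
cascade = [(["cheap"], lambda x: x["cheap"] < 0.5),
           (["cheap", "expensive"], lambda x: True)]
# A flat classifier that extracts every feature for every input.
flat = [(["cheap", "expensive"], lambda x: True)]
# Class-imbalanced toy inputs: 9 exit at stage 1, 1 needs stage 2.
inputs = [{"cheap": 0.0, "expensive": 0.0}] * 9 + [{"cheap": 1.0, "expensive": 1.0}]
print(amortized_cost(cascade, costs, inputs))  # 3.0
print(amortized_cost(flat, costs, inputs))     # 21.0
```

In this toy setting, 9 of the 10 inputs leave after the cheap first stage, so the cascade averages 3.0 cost units per input versus 21.0 for the flat classifier; the saving depends entirely on how many inputs can exit early, which is why a cascade suits class-imbalanced tasks.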
Similar resources
A Multi-Mode Resource-Constrained Optimization of Time-Cost Trade-off Problems in Project Scheduling Using a Genetic Algorithm
In this paper, we present a genetic algorithm (GA) for optimization of a multi-mode resource-constrained time-cost trade-off (MRCTCT) problem. In the proposed GA, each activity has several operational modes, and each mode identifies a possible execution time and cost of the activity. Beyond earlier studies on the time-cost trade-off problem, in the MRCTCT problem, resource requirements of each execution mo...
A multi-objective resource-constrained optimization of time-cost trade-off problems in scheduling project
This paper presents a multi-objective resource-constrained project scheduling problem with positive and negative cash flows. Net present value (NPV) maximization and makespan minimization are the objectives of this study. Since the problem is an NP-hard optimization problem, we present a mathematical model for it and solve it using three evolutionary algorithms;...
A mathematical model for the multi-mode resource investment problem
This paper presents an exact model for the resource investment problem with generalized precedence relations in which the minimum or maximum time lags between a pair of activities may vary depending on the chosen modes. All resources considered are renewable. The objective is to determine a mode and a start time for each activity so that all constraints are obeyed and the resource investment co...
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognizing emotion in Persian text, framed as a supervised machine learning problem. We used Plutchik's emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
Optimization of single outsourcer–single subcontractor outsourcing relationship under reliability and maintenance constraints
In this paper, we focus on optimizing outsourcing activities in a single-period setting. In some situations, capacity planning or outsourcing is a one-time event and can be modeled as a single-period problem. The aim of this research is to balance the trade-off between two echelons of a supply chain consisting of a single outsourcer and a single subcontractor. Each part is composed of a...
Publication date: 2015